SUMMARY

The relationship of minimum distance (MD) estimation to other methods
of estimation is considered. M-estimation is viewed as a special case,
with interesting interpretations in terms of the defining $\psi$-function
as related to components of goodness-of-fit statistics and modified
Fourier approximations to the efficient score. Applications to the
composite and simple goodness-of-fit problems are considered.

Portions  of  this  research  were  supported  under  ONR  Contract  N00014-75-C-0439. 
1. Introduction, Definitions, and Consistency

Robust estimation has received much attention in recent statistical
literature, with a comprehensive survey given by Huber (1977). The
problem considered is as follows: a random sample $X_1, \ldots, X_n$ is
observed from some unknown distribution $G$, where it is presumed
(although not necessarily true) that $G \in \Gamma = \{F_\theta, \theta \in \Omega\}$,
where the model $\Gamma$ is a parametrized family of distribution
functions. The goal of robust estimation is to estimate $\theta$ with an
estimator $T[G_n]$, such that $T$ is nearly fully efficient when
$G \in \Gamma$, i.e. when the model is correct, and which estimates a
meaningful quantity with reasonable efficiency when $G \notin \Gamma$,
but $G$ is close to $\Gamma$ in an appropriate topology on the space of
distribution functions.

Minimum distance estimation was first subjected to comprehensive
study in a series of papers culminating in Wolfowitz (1957), and has
since been considered as a method for deriving robust estimators by
Knüsel (1969) and Parr and Schucany (1980). An extensive bibliography
is given by Parr (1980). The basic philosophy of minimum distance (MD)
estimation is to match the empirical distribution function $G_n$ to an
element, $F_\theta$, of the model $\Gamma$ as closely as possible. Thus,
for a suitably chosen "distance function" $\delta(\cdot,\cdot)$ measuring
the discrepancy between two distribution functions, an MD-estimator of
$\theta$ based on $G_n$ and with respect to the model $\Gamma$ and the
discrepancy $\delta(\cdot,\cdot)$ is given by a value $T$ such that

(1)  $\delta(G_n, F_T) = \inf_{\theta \in \Omega} \delta(G_n, F_\theta)$.

Due to possible nonuniqueness of the value $T$ achieving the infimum or
to nonattainability of the infimum in $\Gamma$, we are forced for
generality to the following definition.

Definition. A sequence of random variables $\{T_n\}_{n=1}^\infty$ is a
sequence of asymptotic minimum distance estimators based on
$\{G_n\}_{n=1}^\infty$ with respect to the model $\Gamma$ and the
discrepancy $\delta(\cdot,\cdot)$ if and only if

i) $T_n \in \Omega$ for all $n \ge 1$

and ii) there exists a nonnegative function $K(n)$ with
$\lim_{n\to\infty} K(n) = 0$ such that for all $n \ge 1$

(2)  $\delta(G_n, F_{T_n}) \le \inf_{\theta \in \Omega} \delta(G_n, F_\theta) + K(n)$.

Some natural choices for the discrepancy $\delta(\cdot,\cdot)$ would
include the Kolmogorov discrepancy,

$D(K,L) = \sup_x |K(x) - L(x)|$,

the Kuiper discrepancy,

$V(K,L) = \sup_{-\infty<a<b<\infty} |\{K(b) - K(a)\} - \{L(b) - L(a)\}|$,

and the class of discrepancies given by

(3)  $H^2_{a,b}(K,L) = a\int \{K(x) - L(x)\}^2\,dL(x) + b\left[\int \{K(x) - L(x)\}\,dL(x)\right]^2$,

considered by Sahler (1970), which includes the Cramér-von Mises
discrepancy $W^2$ for $a = 1$, $b = 0$; the Watson $U^2$ discrepancy for
$a = 1$, $b = -1$; and the Chapman discrepancy, $C^2$, for $a = 0$,
$b = 1$. We assume here and henceforth that all distribution functions in
$\Gamma$ are absolutely continuous. Actual choice of which discrepancy to
use for a specific situation would depend upon i) which aspects of the
sampled population one desires to match, ii) efficiency considerations,
and iii) robustness considerations. The connections of MD-estimation with
other methods discussed should provide some insight into the trade-off
among those competing criteria.
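As an illustration of definition (1), the following is a minimal sketch of an MD-estimator of location under the Cramér-von Mises discrepancy ($a = 1$, $b = 0$ in (3)). The normal translation model, the function names, and the grid-plus-golden-section search are assumptions made for this example only; they are not taken from the paper.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function (assumed prototype F_0)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cvm_discrepancy(sample, theta, cdf=normal_cdf):
    """integral (G_n - F_theta)^2 dF_theta for F_theta(x) = cdf(x - theta),
    using the standard computing formula on the sorted values u_(i)."""
    n = len(sample)
    u = sorted(cdf(x - theta) for x in sample)
    s = sum((u[i] - (2 * i + 1) / (2.0 * n)) ** 2 for i in range(n))
    return (s + 1.0 / (12.0 * n)) / n


def md_location(sample, cdf=normal_cdf, grid=200, tol=1e-8):
    """Approximate minimizer of the discrepancy over theta: a coarse grid
    over the sample range, then golden-section refinement around the best
    grid point (assumes the criterion is unimodal on that bracket)."""
    f = lambda t: cvm_discrepancy(sample, t, cdf)
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / grid
    thetas = [lo + k * step for k in range(grid + 1)]
    k_best = min(range(grid + 1), key=lambda k: f(thetas[k]))
    a, b = thetas[max(k_best - 1, 0)], thetas[min(k_best + 1, grid)]
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


rng = random.Random(0)
clean = [rng.gauss(2.0, 1.0) for _ in range(200)]
dirty = clean + [50.0] * 10          # a few gross outliers
print(md_location(clean), md_location(dirty), sum(dirty) / len(dirty))
```

In this sketch the outliers all map to values of $F_\theta$ near one, so they shift the criterion only slightly, whereas the sample mean is pulled far from the bulk of the data.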

It is of interest to determine conditions under which a sequence
of asymptotic MD estimators is consistent. The following theorem
(a generalization of Theorem 1 of Parr and Schucany (1980)) provides
suitable (if somewhat stringent) restrictions on $\Gamma$,
$\delta(\cdot,\cdot)$, and the sampling situation.

Theorem 1: Let $\{G_n\}_{n=1}^\infty$ be a sequence of random
distribution functions on the real line and $\{T_n\}_{n=1}^\infty$ be a
sequence of asymptotic MD-estimators based on $\{G_n\}_{n=1}^\infty$ with
respect to $\Gamma = \{F_\theta, \theta \in \Omega\}$ and
$\delta(\cdot,\cdot)$. If the following hold:

i) there exists a metric $\|\cdot\|$ on $\mathcal{F}$ (where
$\mathcal{F}$ is the space of one-dimensional distribution functions)
such that $\|G_n - G\| \to 0$ with probability one,

ii) the class of functions $\{\delta(\cdot,F_\theta), \theta \in \Omega\}$
is equicontinuous at $G$ (with respect to the metric $\|\cdot\|$),

iii) there exists a point $\theta_0 \in \Omega$ such that
$\delta(G,F_{\theta_0}) < \delta(G,F_\theta)$ for $\theta \ne \theta_0$,
$\theta \in \Omega$, and

iv) for any sequence $\{\theta_k\}_{k=1}^\infty$ of elements of $\Omega$,
$\lim_{k\to\infty} \delta(G,F_{\theta_k}) = \delta(G,F_{\theta_0})$
implies $\lim_{k\to\infty} \theta_k = \theta_0$,

then $T_n \to \theta_0$ with probability one.

Proof: The proof is trivial and hence omitted.


Notes:

1) Conditions iii) and iv) are designed to ensure uniqueness
of the minimum of $\delta(G,F_\theta)$ and a reasonable parametrization
of $\Gamma$, respectively.

2) Condition i) is the only restriction on the sampling situation.
While it is easily satisfied for "small" choices of $\|\cdot\|$,
ii) competes by being easily satisfied by "big" $\|\cdot\|$. A
typical choice might be the $L_\infty$ metric (Kolmogorov discrepancy).
For such a choice and random sampling, if $\Gamma$ is a translation
family with $\theta$ the translation parameter ($F_\theta(x) = F_0(x - \theta)$
for all $(x,\theta)$ and $G \in \Gamma$), the conditions are satisfied for
all discrepancies mentioned in this section. They are also
satisfied for the above discrepancies when $G \notin \Gamma$ if iii)
and iv) hold.

3) Condition ii) can be omitted if $\delta(\cdot,\cdot)$ is a metric on
$\mathcal{F}$.

4) The theorem is really a statement of continuity of the functional
$T[G_n] = T_n$ at $G$ with respect to the metric $\|\cdot\|$.

5) Condition ii) could (at a sacrifice of simplicity) clearly be
relaxed to requiring equicontinuity of the $\delta(\cdot,F_\theta)$ at $G$
only for $\theta$ in a neighborhood $U(\theta_0)$ of $\theta_0$, and that

$\inf_{\theta \in \Omega - U(\theta_0)} \delta(H,F_\theta) > M + O(\|H - G\|)$

for some $M > \delta(G,F_{\theta_0})$.


The statement of general results for asymptotic distribution
theory proves to be much less succinct. MD estimators divide into two
basic types: 1) those based upon "integral-type" discrepancies such
as $H^2_{a,b}$ or weighted versions thereof, and 2) those based upon
"sup-type" discrepancies such as $D$, $V$, or weighted versions thereof.
The first type are asymptotically normal under suitable conditions (Sahler
(1970), Parr and Schucany (1980), Parr and DeWet (1979) and Boos (1980)),
while the second type are typically not asymptotically normal (Bolthausen
(1977)) even in the simplest and smoothest cases. Littell and Rao (1975)
and Pollard (1980) are also good references in this area.

As we shall see in the following sections, frequency-domain
analyses of MD-procedures can yield a great amount of insight into the
proper choice of the discrepancy $\delta(\cdot,\cdot)$, based upon the
competing goals of deriving an estimator with high efficiency when
$G \in \Gamma$ and of maintaining robustness when $G \notin \Gamma$. The
first criterion (efficiency) will require high fidelity of a particular
tapered Fourier approximation to the efficient score, while the second
(robustness) will amount to use of a low-pass filter to dampen out high
frequency components of the same approximate.


2. Components of Goodness-of-fit Statistics and MD Estimation

Durbin and Knott (1972) introduced the idea of interpreting the
quantities $Z_{nj}^2$ in the orthogonal representation of the
Cramér-von Mises statistic

(4)  $n H^2_{1,0}(G_n, F_\theta) = \sum_{j=1}^\infty \frac{Z_{nj}^2}{j^2\pi^2}$

as components representing different aspects of the discrepancy between
$G_n$ and $F_\theta$, where

(5)  $Z_{nj} = \sqrt{2n}\, j\pi \int \{G_n(x) - F_\theta(x)\} \sin\{j\pi F_\theta(x)\}\,dF_\theta(x) = \sqrt{2/n} \sum_{i=1}^n \cos\{j\pi F_\theta(X_i)\}$.

Here, $X_1, \ldots, X_n$ is a random sample of size $n$ from some
distribution $G$ and the statistic $H^2_{1,0}$ is being used to test
whether or not $G = F_\theta$. For "smooth" alternatives, the main source
of the discrepancy between $G_n$ and $F_\theta$ should be in the first
few components. For a null hypothesis of $X_i$ standard normal, Durbin
and Knott found that

i) $Z_{n1}$ contained most of the information about pure location
shifts, having an asymptotic power of .93 against contiguous
location changes when the t-test had power .95,

ii) $Z_{n2}$ was similarly efficient against scale changes,

iii) $Z_{n1}$ was orthogonal to contiguous scale changes and $Z_{n2}$
orthogonal to contiguous location shifts.

This suggests that a suitable reweighting of the components in (4)
or a similar test might result in a higher efficiency. This program of
study is carried out in Schoenfeld (1977). To discuss such extensions
we need the following notation:

Let $\{\xi_0(\cdot), \xi_1(\cdot), \ldots\}$ be a complete orthonormal
basis for the space of square integrable functions on [0,1]. Require
$\xi_0(u) = 1$, $0 \le u \le 1$, so that $\int_0^1 \xi_j(u)\,du = 0$,
$j \ne 0$. Let $\Gamma = \{F_\theta, \theta \in \Omega\}$ be a
parametrized family of distribution functions (called the "model") and
$G$ and $G_n$ be as above. Further paralleling (5), define the random
functions

(6)  $d_{nj}(\theta) = \frac{1}{n} \sum_{i=1}^n \xi_j(F_\theta(X_i))$.

When $\xi_j(u) = \sqrt{2}\cos(j\pi u)$, $\sqrt{n}\,d_{nj}(\theta)$ is
$Z_{nj}$, the jth component of the Cramér-von Mises statistic as in (5).
Thus

(7)  $H^2_{1,0}(G_n, F_\theta) = \sum_{j=1}^\infty \frac{d_{nj}^2(\theta)}{j^2\pi^2}$

in this special case. More generally, it is expressed as a weighted sum
of the squared $d_{nj}(\theta)$.

Since the $\xi_j(\cdot)$ will usually have a frequency-type
interpretation, different weightings of the squared components will
correspond to the creation of a goodness-of-fit test sensitive to
departures from the null hypotheses having specific frequency
interpretations. As we shall see from the following, similar
interpretations will be possible for MD-estimators related to these
tests.
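The decomposition of the Cramér-von Mises discrepancy into cosine components can be checked numerically. The sketch below assumes the basis $\xi_j(u) = \sqrt{2}\cos(j\pi u)$ and works directly with the transformed values $u_i = F_\theta(X_i)$; the function names and the truncation point of the series are choices made for this illustration.

```python
import math
import random


def components(u, J):
    """d_nj = (1/n) sum_i sqrt(2) cos(j pi u_i) for j = 1..J,
    where u_i = F_theta(X_i) are the probability-transformed data."""
    n = len(u)
    return [math.sqrt(2.0) * sum(math.cos(j * math.pi * ui) for ui in u) / n
            for j in range(1, J + 1)]


def cvm_direct(u):
    """integral_0^1 (G_n(t) - t)^2 dt via the usual computing formula."""
    n = len(u)
    us = sorted(u)
    s = sum((us[i] - (2 * i + 1) / (2.0 * n)) ** 2 for i in range(n))
    return (s + 1.0 / (12.0 * n)) / n


def cvm_series(u, J):
    """Partial sum of d_nj^2 / (j pi)^2 over the first J components."""
    d = components(u, J)
    return sum(dj * dj / (j * math.pi) ** 2 for j, dj in enumerate(d, start=1))


rng = random.Random(1)
u = [rng.random() for _ in range(40)]
direct = cvm_direct(u)
series = cvm_series(u, 5000)
print(direct, series)   # the partial sum approaches the direct value from below
```

Because every term of the series is nonnegative and $d_{nj}^2 \le 2$, the truncation error after $J$ terms is at most $2/(\pi^2 J)$, which is what the comparison above relies on.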

In the context of (possibly robust) minimum distance estimation,
this suggests a broadening of the class of estimators. Define the random
functions which are candidates for useful new discrepancy measures
between $\Gamma$ and $G_n$,

(8)  $K(\theta,a;G_n) = \left\{\sum_{j=1}^\infty a_j d_{nj}(\theta)\right\}^2$

for some fixed sequence $\{a_j\}$ such that $\sum_{j=1}^\infty a_j^2 < \infty$,
and

(9)  $L(\theta,b;G_n) = \sum_{j=1}^\infty b_j d_{nj}^2(\theta)$

for $\sum_{j=1}^\infty |b_j| < \infty$.
j-1  3 
We are now able to consider MD-estimation utilizing the discrepancies
$K(\theta,a;G_n)$ and $L(\theta,b;G_n)$.

Case 1: Estimators minimizing $K(\theta,a;G_n)$

$K(\theta,a;G_n)$ may be written as a functional of $G_n$ in the form

$K(\theta,a;G_n) = \left[\int \sum_{j=1}^\infty a_j \xi_j(F_\theta(x))\,dG_n(x)\right]^2$.

Observe that

$K(\theta,a;F_\theta) = \left[\int \sum_{j=1}^\infty a_j \xi_j(F_\theta(x))\,dF_\theta(x)\right]^2 = 0$,

so that when $G \in \Gamma$, the estimand is a root of the equation

(10)  $\int \sum_{j=1}^\infty a_j \xi_j(F_\theta(x))\,dG(x) = 0$.

It follows that the value $T_K[G_n]$, which minimizes $K(\theta,a;G_n)$
as a function of $\theta$, is (subject to the conditions of Theorem 2) a
root of

(11)  $\sum_{i=1}^n \sum_{j=1}^\infty a_j \xi_j(F_\theta(X_i)) = 0$,

i.e. that $T_K[G_n]$ is an M-estimator with defining $\psi$-function
given by

(12)  $\psi(x;\theta) = \sum_{j=1}^\infty a_j \xi_j(F_\theta(x))$.

Thus, the usual theory for M-estimation is applicable.
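To make the M-estimator reading concrete, here is a small sketch that solves the estimating equation for a location parameter by bisection. It assumes a normal translation model, the cosine basis $\xi_j(u) = \sqrt{2}\cos(j\pi u)$, and a hypothetical weight sequence with only $a_1 = 1$ and $a_3 = 0.1$ nonzero, chosen so that the estimating function is monotone in $\theta$; none of these choices comes from the paper.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function (assumed prototype F_0)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def psi(x, theta, a):
    """psi(x; theta) = sum_j a_j sqrt(2) cos(j pi F_0(x - theta));
    `a` maps the index j to the weight a_j (finitely many nonzero)."""
    u = normal_cdf(x - theta)
    return sum(a_j * math.sqrt(2.0) * math.cos(j * math.pi * u)
               for j, a_j in a.items())


def estimating_function(sample, theta, a):
    """Sum of psi over the sample; the estimator is a root in theta."""
    return sum(psi(x, theta, a) for x in sample)


def m_estimate(sample, a, tol=1e-10):
    """Bisection for a root, assuming a sign change over a wide bracket."""
    lo, hi = min(sample) - 10.0, max(sample) + 10.0
    f_lo = estimating_function(sample, lo, a)
    f_hi = estimating_function(sample, hi, a)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = estimating_function(sample, mid, a)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


weights = {1: 1.0, 3: 0.1}           # hypothetical tapered weights
rng = random.Random(2)
sample = [rng.gauss(1.5, 1.0) for _ in range(300)]
theta_hat = m_estimate(sample, weights)
print(theta_hat, estimating_function(sample, theta_hat, weights))
```

Since $\psi$ is bounded by $\sqrt{2}\sum_j |a_j|$, a single observation can move the estimating function only by a bounded amount, which is the robustness property discussed in the text.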

For the following result we further require that the $\xi_j(\cdot)$
be individually continuous and uniformly bounded. Such orthonormal bases
for $L^2[0,1]$ do exist, such as $\xi_j(u) = 2^{1/2}\cos(j\pi u)$,
$j = 1, 2, \ldots$. These conditions are stronger than necessary, but
serve to reduce the mathematical complexity of the results.
Theorem 2: If

i) $\sum_{j=1}^\infty |a_j| < \infty$, and

ii) $\frac{\partial}{\partial\theta} \int \sum_{j=1}^\infty a_j \xi_j(F_\theta(x))\,dG(x) \Big|_{\theta=\theta_0}$
is finite and nonzero, where $\theta_0$ is a root of

$\int \sum_{j=1}^\infty a_j \xi_j(F_\theta(x))\,dG(x) = 0$,

then there exists a sequence $\{T_K[G_n]\}_{n=1}^\infty$ of roots of
equation (11) such that $T_K[G_n] \to \theta_0$ with probability one.

Proof: The result is a simple corollary of elementary consistency theorems
for M-estimators, since i) $\theta_0$ is an isolated root and
ii) $\sum_{j=1}^\infty a_j \xi_j(F_\theta(x))$ is continuous and bounded.

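Theorem 3: If in addition to the assumptions of Theorem 2,

$\frac{\partial}{\partial\theta} \sum_{j=1}^\infty a_j \xi_j(F_\theta(x)) \Big|_{\theta=\theta_0}$

is uniformly continuous in $x$, then any sequence $\{T_K[G_n]\}$
satisfying $T_K[G_n] \to \theta_0$ also satisfies

$\sqrt{n}\,(T_K[G_n] - \theta_0) \to_d N(0, \sigma_K^2)$,

with

(13)  $\sigma_K^2 = \frac{E_G[\psi^2(X;\theta_0)]}{\left\{E_G\left[\frac{\partial\psi(X;\theta)}{\partial\theta}\Big|_{\theta=\theta_0}\right]\right\}^2}$.

Proof: This is also a simple consequence of standard normality theorems
for the associated M-estimator.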

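Note: If $G \in \Gamma$ and

$\frac{\partial^2 F_\theta(x)}{\partial\theta\,\partial x} = \frac{\partial^2 F_\theta(x)}{\partial x\,\partial\theta}$

almost everywhere, with $\lim_{x\to\pm\infty} \frac{\partial F_\theta(x)}{\partial\theta} = 0$
for all $\theta$, then

(14)  $\sigma_K^2 = \frac{\sum_{j=1}^\infty a_j^2}{\left(\sum_{j=1}^\infty a_j c_j\right)^2}$,

where

$c_j = \int \xi_j(F_\theta(x))\, \frac{\partial \ln f_\theta(x)}{\partial\theta}\, f_\theta(x)\,dx$

is the jth Fourier coefficient in the expansion of the score
function

(15)  $J(u) = \frac{\partial}{\partial\theta} \ln f_\theta(x) \Big|_{x=F_\theta^{-1}(u)} = \sum_{j=1}^\infty c_j \xi_j(u)$.

(The expansion is valid if the Fisher information is finite.)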
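The way the general M-estimator variance (13) specializes when $G = F_\theta$ can be sketched as follows. This is a reconstruction under the regularity conditions of the note (interchange of the mixed partial derivatives, vanishing of $\partial F_\theta/\partial\theta$ in the tails, and termwise integration justified by absolute summability of the $a_j$ and uniform boundedness of the $\xi_j$), with $\psi$ as in (12); it is offered as a plausible derivation rather than as the authors' own argument.

```latex
\begin{align*}
E_{F_\theta}\!\left[\psi^2(X;\theta)\right]
  &= \int_0^1 \Big\{\sum_{j\ge 1} a_j \xi_j(u)\Big\}^2 du
   = \sum_{j\ge 1} a_j^2
   && \text{($F_\theta(X)$ uniform, $\xi_j$ orthonormal)} \\
E_{F_\theta}\!\left[\frac{\partial \psi(X;\theta)}{\partial\theta}\right]
  &= \sum_{j\ge 1} a_j \int \xi_j'\big(F_\theta(x)\big)\,
     \frac{\partial F_\theta(x)}{\partial\theta}\, f_\theta(x)\,dx \\
  &= -\sum_{j\ge 1} a_j \int \xi_j\big(F_\theta(x)\big)\,
     \frac{\partial f_\theta(x)}{\partial\theta}\,dx
   && \text{(integration by parts)} \\
  &= -\sum_{j\ge 1} a_j \int \xi_j\big(F_\theta(x)\big)\,
     \frac{\partial \ln f_\theta(x)}{\partial\theta}\, f_\theta(x)\,dx
   = -\sum_{j\ge 1} a_j c_j , \\
\sigma_K^2
  &= \frac{\sum_{j\ge 1} a_j^2}{\big(\sum_{j\ge 1} a_j c_j\big)^2} .
\end{align*}
```

By the Cauchy-Schwarz inequality the last expression is at least $1/\sum_j c_j^2$, the reciprocal of the Fisher information, with equality when the $a_j$ are proportional to the $c_j$.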

For notational convenience, the derivative of a function with respect
to its argument will be denoted by a prime ($'$), its derivative with
respect to $\theta$ denoted by a dot ($\cdot$). For example,
$\dot F_\theta(x) = \partial F_\theta(x)/\partial\theta$ and
$F_\theta'(x) = f_\theta(x)$. Second order derivatives with respect to
$\theta$ will be denoted by two dots ($\cdot\cdot$). Also, we write
$\ell_\theta(x) = \ln f_\theta(x)$. Hence the score function in (15) is
$\dot\ell_\theta(x)$, and efficiency of $T_K$ is related to the
"correlation" of the $a$ and $c$ sequences. A by-product of the proof is
the fact that

(17)  $\sqrt{n}\left\{T_K[G_n] - \theta_0 - \frac{1}{n}\sum_{i=1}^n IC_{T_K,F_{\theta_0}}(X_i)\right\} \to_p 0$,

where

(18)  $IC_{T_K,F_{\theta_0}}(x) = \frac{\sum_{j=1}^\infty a_j \xi_j(F_{\theta_0}(x))}{\sum_{j=1}^\infty a_j c_j}$.

Note further that, for the case of an unbounded score function, full
efficiency of the estimator is inconsistent with absolute summability
of the $a_j$. However, when the $a_j$ are absolutely summable, we have
that $IC_{T_K,F_{\theta_0}}(x)$ is bounded and continuous, and hence
$T_K[G_n]$ is robust in that sense. (In fact,
$\sum_{j=1}^\infty a_j \xi_j(u)$ is uniformly continuous.)


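Case 2: Estimators minimizing $L(\theta,b;G_n)$

Computing the influence curve for $T_K$ yields

(19)  $IC_{T_K,G}(x) = \frac{\sum_{j=1}^\infty a_j \xi_j(F_{\theta_0}(x))}{-\int \sum_{j=1}^\infty a_j \xi_j'(F_{\theta_0}(y))\,\dot F_{\theta_0}(y)\,dG(y)} = \frac{\sum_{j=1}^\infty a_j \xi_j(F_{\theta_0}(x))}{\sum_{j=1}^\infty a_j c_j}$.

The second equality is true with $c_j$ as defined in the note to
Theorem 3, under the conditions stated in that note.

Now, computing the influence curve for $T_L$, we obtain

(20)  $IC_{T_L,G}(x) = -\frac{\sum_{j=1}^\infty b_j e_j(\theta_0)\, \xi_j(F_{\theta_0}(x))}{\sum_{j=1}^\infty b_j e_j^2(\theta_0)}$,

where

$e_j(\theta_0) = E_G\left[\xi_j'(F_{\theta_0}(X))\,\dot F_{\theta_0}(X)\right]$.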
Thus, the choice of $b_j(\theta_0) = a_j / e_j(\theta_0)$ yields locally
the same influence curve for the two methods of MD-estimation. In fact,
the following stronger result holds.

Theorem 4: Let $\{\xi_j, j = 0, 1, \ldots\}$ be a complete orthonormal
basis for $L^2[0,1]$, where the $\xi_j(\cdot)$ are individually continuous
and uniformly bounded. Let $\Gamma$, $\{L(\theta,b;G_n)\}_{n=1}^\infty$,
$\{T_L(G_n)\}_{n=1}^\infty$ be such that $\{T_L(G_n)\}_{n=1}^\infty$
minimizes $\{L(\theta,b;G_n)\}_{n=1}^\infty$ componentwise and $\theta_0$
minimizes $L(\theta,b;G)$. Then if

i) $T_L(G_n) \to \theta_0$ in probability,

ii) for some $\varepsilon > 0$, $\theta \in (\theta_0 - \varepsilon, \theta_0 + \varepsilon)$
implies $\sum_{j=1}^\infty |b_j|\, \|\xi_j(F_\theta(\cdot))\|_V < \infty$,
where $\|\cdot\|_V$ is the total variation norm, and

iii) $\xi_j'(F_\theta(x))\,\dot F_\theta(x)$ is a continuous function of
$\theta$ at $\theta_0$ for a set of $x$ having $G$-probability one for
each $j$, then

$\sqrt{n}\,(T_L(G_n) - \theta_0) \to_d N(0, \sigma_L^2(\theta_0))$,

where

$\sigma_L^2(\theta_0) = \frac{4\,\mathrm{Var}_G\left\{\sum_{j=1}^\infty b_j e_j(\theta_0)\,\xi_j(F_{\theta_0}(X))\right\}}{\{\ddot L(\theta_0,b;G)\}^2}$

is assumed finite.

Hence, subject to the conditions of these two theorems, the estimators
$T_K(G_n)$ and $T_L(G_n)$ have the same asymptotic distribution if
$b_j(\theta_0)\,e_j(\theta_0) = a_j$.
Proof: The proof is given in the appendix.

Notes:

1) If $G = F_{\theta_0}$, or more generally
$\int \xi_j(F_{\theta_0}(x))\,dG(x) = 0$ for all $j$, then
$\ddot L(\theta_0,b;G) = 2\sum_{j=1}^\infty b_j e_j^2(\theta_0)$.

2) Condition ii) is the crucial one for the proof of this result.
It is not necessary, as can be seen by considering the weights
$b_j = 1/(j^2\pi^2)$ and $\xi_j(u) = 2^{1/2}\cos(j\pi u)$, for which
$L(\theta,b;G_n)$ is the Cramér-von Mises statistic, the desired
asymptotic normality holding in the location case if $G \in \Gamma$ and
the population density is cube integrable. However, ii) fails in this
case.

Thus, estimators derived from minimizing $L(\theta,b;G_n)$ can duplicate
the behavior of those derived from $K(\theta,a;G_n)$. Examining the case
of $K(\theta,a;G_n)$ in (for simplicity) the location case, we have

$IC_{T_K,F_{\theta_0}}(x) = \frac{\sum_{j=1}^\infty a_j \xi_j(F_{\theta_0}(x))}{\sum_{j=1}^\infty a_j c_j}$.

Thus, if we have the $\xi_j$ ordered according to a "frequency" idea,
i.e. perhaps $\xi_j(u) = 2^{1/2}\cos(j\pi u)$, we see that

i) Since the $\xi_j$ are uniformly bounded, each is continuous on [0,1],
and $\sum_{j=1}^\infty |a_j| < \infty$, $IC_{T_K,F_{\theta_0}}$ is
uniformly continuous (a robustness property).

ii) The extent to which the weights $a_j$ "taper off" as $j \to \infty$
will correspond to the degree of differentiability of
$IC_{T_K,F_{\theta_0}}(x)$, and hence to the degree to which $T_K$
possesses additional robustness properties.

iii) The inner product

$\frac{\sum_{j=1}^\infty a_j c_j}{\left\{\sum_{j=1}^\infty a_j^2 \sum_{j=1}^\infty c_j^2\right\}^{1/2}}$

determines the efficiency of the estimation procedure. Thus, for
instance, if the $a_j$'s taper off fast to achieve i) and ii) and in
doing so fail to maintain a high correlation with the $c_j$'s, the
efficiency of the estimator will be low.

The desire to perfectly duplicate the high frequency aspects of
$\dot\ell_\theta(x)$ must produce non-robust estimators by violating i)
and ii) to achieve iii). Many results in robustness may in fact be viewed
as means of tapering the sequence $\{a_j\}$ to eliminate or minimize high
frequency components of $\dot\ell_\theta(x)$ while maintaining high
fidelity. Therefore, we have seen that both M- and MD-estimation may be
linked to a (tapered) Fourier expansion of $\dot\ell_\theta(x)$.

Figures 1 and 2 illustrate this phenomenon. They give the truncated
Fourier approximates to $\dot\ell_\theta(F_\theta^{-1}(u))$ of the form
$\sum_{j=1}^M c_j \xi_j(u)$ for $M = 1, 3, 5, 7$, and $\infty$. (We take
$\theta = 0$ without loss of generality.) (Only $u > .5$ is shown, since
both the functions and their approximates are odd in $F_\theta^{-1}(u)$,
and hence $c_{2k} = 0$, $k = 1, 2, \ldots$.)
For the normal density, inclusion of more terms (increasing $M$) allows
greater fidelity to $\Phi^{-1}(u) = \dot\ell_0(F_0^{-1}(u))$, at the price
of increasing the supremum of the approximate. For the Laplace density,
the added terms improve the approximation near the discontinuity in
$\dot\ell_0(F_0^{-1}(u)) = 1 - 2I(u < .5)$, at the expense of making the
approximate's derivatives larger. Relative efficiencies attained by
K-type estimators using the various truncations of the efficient scores
are given in Table 1.
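Efficiencies of this kind can be approximated numerically. The sketch below assumes the cosine basis $\xi_j(u) = \sqrt{2}\cos(j\pi u)$, unit-scale normal and Laplace location models (Fisher information 1 in both cases), and the variance form $\sum_j a_j^2 / (\sum_j a_j c_j)^2$ with $a_j = c_j$ for $j \le M$ and $a_j = 0$ otherwise, so that the relative efficiency is $\sum_{j \le M} c_j^2$. The midpoint-rule integration and the function names are choices made for this illustration, and the printed values are not claimed to reproduce Table 1.

```python
import math
from statistics import NormalDist


def fourier_coefficients(score_u, M, N=20000):
    """c_j = int_0^1 sqrt(2) cos(j pi u) score_u(u) du for j = 1..M,
    approximated by the midpoint rule on N cells."""
    us = [(k + 0.5) / N for k in range(N)]
    vals = [score_u(u) for u in us]
    return [math.sqrt(2.0)
            * sum(v * math.cos(j * math.pi * u) for u, v in zip(us, vals)) / N
            for j in range(1, M + 1)]


def truncated_efficiency(score_u, M, fisher_info=1.0):
    """Relative efficiency of the K-type estimator whose weights are the
    first M Fourier coefficients of the score: sum_{j<=M} c_j^2 / I."""
    c = fourier_coefficients(score_u, M)
    return sum(cj * cj for cj in c) / fisher_info


normal_score = NormalDist().inv_cdf          # score at F^{-1}(u): Phi^{-1}(u)


def laplace_score(u):
    """Score at F^{-1}(u) for the Laplace location model: 1 - 2 I(u < .5)."""
    return 1.0 if u > 0.5 else -1.0


for M in (1, 3, 5, 7):
    print(M, truncated_efficiency(normal_score, M),
          truncated_efficiency(laplace_score, M))
```

For the Laplace score the first coefficient is available in closed form, $c_1 = -2\sqrt{2}/\pi$, which gives an efficiency of $8/\pi^2$ at $M = 1$ and serves as a check on the quadrature.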




3. Connections of Minimum Distance to Other Estimation Methods


Several interesting connections exist between MD and other methods
of estimation. Estimation of $\theta_0$ based upon minimizing
$K(\theta,a;G_n)$ is easily seen to be equivalent to defining the
estimator $\hat\theta_n$ to be a root of

$\sum_{j=1}^\infty a_j d_{nj}(\hat\theta_n) = 0$,

or

$\sum_{i=1}^n \left\{\sum_{j=1}^\infty a_j \xi_j(F_{\hat\theta_n}(X_i))\right\} = 0$.

Hence, this MD estimator is an M-estimator with defining $\psi$-function

$\psi(x;\theta) = \sum_{j=1}^\infty a_j \xi_j(F_\theta(x))$,

as noted in Section 2.

Hodges and Lehmann (1963) obtained robust estimators of location
via the "inversion" of rank tests. For a random sample $X_1, \ldots, X_n$
from $G$ and any value $-\infty < \theta < \infty$ they define the mirror
image of the sample about $\theta$ by $Y_i(\theta) = 2\theta - X_i$,
$i = 1, \ldots, n$. If the distribution $G$ is symmetric about
$\theta_0$, then the $\{Y_i(\theta_0)\}$ have the same distribution as
the $\{X_i\}$. Then, taking $h$ to be any two-sample rank test statistic
for shift having the property that when the two populations are
identical, the distribution of $h$ is symmetric about some value $\mu$,
they define a rank (R) estimate of location by

$\hat\theta = \left[\sup\{\theta: h(X_1, \ldots, X_n; Y_1(\theta), \ldots, Y_n(\theta)) > \mu\} + \inf\{\theta: h(X_1, \ldots, X_n; Y_1(\theta), \ldots, Y_n(\theta)) < \mu\}\right]/2$.
Thus, an R-estimator of location "inverts" a rank test in the sense
that it selects as an estimator that value, $\hat\theta$, such that the
rank test based upon the statistic $h(X, Y(\hat\theta))$ finds it hardest
to reject the hypothesis of symmetry. (A similar development of
R-estimators is possible from one-sample rank tests.)

Similarly, in an obvious fashion, MD-estimators invert goodness-of-fit
tests. This similarity provides a heuristic method for choosing
highly efficient MD-estimators. In R-estimation, Hodges and Lehmann (1963)
found that rank tests possessing high power against location shifts
yielded, upon inversion, extremely efficient R-estimators. Similarly,
minimization of a goodness-of-fit discrepancy which is highly powerful
against location shifts yields an MD-estimator with good efficiency.
This motivation could in fact lead to the fully efficient MD-estimators
discussed in Section 2, which coincide with the optimal M-estimators.
There is, however, the added quirk that most goodness-of-fit tests are
not asymptotically normal, and thus the formal theory developed by
Hodges and Lehmann (1963) does not directly apply to the inversion of
typical goodness-of-fit tests. (But see Parr and DeWet (1979) and
Boos (1980) for development of optimum weighting schemes for
MD-estimation.)

As a method, adaptive estimation is somewhat difficult to characterize.
The spirit of the method as developed by Hogg (1974) and others
is, however, straightforward to describe. Consider for simplicity the
case of location estimation for symmetric populations. The statistician
examines a characteristic of the sample data which measures, perhaps,
tailweight (naively kurtosis, but more likely one of the subsequent
tailweight measures discussed in Hogg (1974) which involve ratios of
scale estimators which are linear functions of order statistics). Based
upon the value of this statistic, an estimator is chosen from a (possibly
infinite) set which is expected to perform well for distributions with
tailweight of the estimated order. Thus, adaptive estimation is a
two-step process:

1) Based upon some characteristic of the data, select an
estimator which is believed to work well for the apparent
class of parent populations (or, otherwise stated, select a
model which appears to be an adequate approximation to the data),

and 2) Use that estimator (or, use an estimation procedure expected
to be competitive at or near the expected model).

A procedure of this sort arises naturally in MD-estimation of a
location parameter. Instead of considering $\Gamma$ = {a specific
location/scale family of distribution functions} and minimizing
$\delta(G_n, F_\theta)$ over $F_\theta \in \Gamma$, we could as well let
$\Gamma = \Gamma_1 \cup \Gamma_2 \cup \cdots \cup \Gamma_k$, where each
$\Gamma_i$ is a distinct location/scale family, generated perhaps by
prototype t-distributions with $\nu_1, \nu_2, \ldots, \nu_k$ degrees of
freedom. Finding an MD-estimator with respect to $\delta$ and $\Gamma$ is
precisely equivalent to choosing the sub-model $\Gamma_i$ such that
$\inf_{F \in \Gamma_i} \delta(G_n, F)$ is the smallest, and then using the
MD-estimator with respect to $\delta$ and $\Gamma_i$. Thus, the
MD-estimator adaptively selects the closest location/scale family (in the
sense of $\delta$) and then estimates based upon a projection into that
family.
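The selection step can be sketched in a few lines. The example below substitutes three prototype families with closed-form distribution functions (normal, logistic, Laplace) for the t-families mentioned in the text, uses the Cramér-von Mises discrepancy, and minimizes over a crude location/scale grid built from the sample median and interquartile range; all of these are assumptions of the illustration, and the sample is assumed non-degenerate.

```python
import math
import random


def logistic_cdf(z):
    """Numerically stable logistic distribution function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


PROTOTYPES = {   # hypothetical stand-ins for the prototype families
    "normal": lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))),
    "logistic": logistic_cdf,
    "laplace": lambda z: 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z),
}


def cvm(xs_sorted, cdf, loc, scale):
    """Cramer-von Mises discrepancy between G_n and cdf((x - loc)/scale)."""
    n = len(xs_sorted)
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(xs_sorted):
        total += (cdf((x - loc) / scale) - (2 * i + 1) / (2.0 * n)) ** 2
    return total / n


def fit_family(xs, cdf, n_loc=21, n_scale=21):
    """Grid search within one location/scale family.
    Returns (discrepancy, loc, scale) at the best grid point."""
    xs = sorted(xs)
    n = len(xs)
    med = xs[n // 2]
    spread = xs[(3 * n) // 4] - xs[n // 4]      # crude interquartile range
    best = (float("inf"), None, None)
    for a in range(n_loc):
        loc = med + spread * (a / (n_loc - 1.0) - 0.5)
        for b in range(n_scale):
            scale = spread * 4.0 ** (2.0 * b / (n_scale - 1.0) - 1.0)
            d = cvm(xs, cdf, loc, scale)
            if d < best[0]:
                best = (d, loc, scale)
    return best


def adaptive_md(xs):
    """Fit every family, then keep the one with the smallest discrepancy."""
    fits = {name: fit_family(xs, cdf) for name, cdf in PROTOTYPES.items()}
    chosen = min(fits, key=lambda name: fits[name][0])
    return chosen, fits


rng = random.Random(0)
sample = [3.0 + 2.0 * rng.gauss(0.0, 1.0) for _ in range(200)]
chosen, fits = adaptive_md(sample)
print(chosen, fits[chosen])
```

Minimizing over the union of the families is the same as minimizing within each family and keeping the smallest value, which is why the two-stage loop implements the single MD-estimator described above.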
4. Goodness-of-fit Tests

In this section we consider the goodness-of-fit problem both in the
simple null case and for composite nulls when the estimator of the
unknown parameters possesses an asymptotically linear structure. This will
enable us to examine simultaneous model-dependent MD-estimation and
goodness-of-fit tests.

In the case of a simple null hypothesis, i.e. testing
$H_0: G = F_{\theta_0}$ versus $H_A: G \ne F_{\theta_0}$, we consider as
test statistics

(21)  $K(\theta_0,r;G_n) = \left\{\sum_{j=1}^\infty r_j d_{nj}(\theta_0)\right\}^2$

and

(22)  $L(\theta_0,s;G_n) = \sum_{j=1}^\infty s_j d_{nj}^2(\theta_0)$, $s_j \ge 0$ for all $j$.

Schoenfeld (1977) examines $K(\theta_0,r;G_n)$ in detail for
$\sum_{j=1}^\infty r_j^2 < \infty$ under $H_0$ and under contiguous
alternative densities of the form

(23)  $p_n(x) = f_{\theta_0}(x)\left\{1 + n^{-1/2} k_n(F_{\theta_0}(x))\right\}$,

with $k_n \to h$, $\int_0^1 h^2(u)\,du < \infty$ and $|k_n(u)| \le m(u)$
for all $n$ with $\int_0^1 m^2(u)\,du < \infty$. (Actually he studies the
signed square root of $K(\theta_0,r;G_n)$.) He derives an asymptotically
optimal choice of the $r_j$ to be of the form

(24)  $r_j^{opt} = \int_0^1 h(u)\,\xi_j(u)\,du$, $j = 1, 2, \ldots$.

Under $H_0$, $nK(\theta_0,r;G_n)$ is of course asymptotically distributed
as a chi-square with one degree of freedom, when divided by
$\sum_{j=1}^\infty r_j^2$.
Under the uniform boundedness condition on the $\xi_j(\cdot)$, we obtain
the null distribution of $nL(\theta_0,s;G_n)$ as

$nL(\theta_0,s;G_n) \to_d \sum_{j=1}^\infty s_j Y_j^2$,

where the $Y_j$ are iid unit normal random variables. Under alternatives
of the form (23),

$nL(\theta_0,s;G_n) \to_d \sum_{j=1}^\infty s_j \left\{Y_j + \int_0^1 h(u)\,\xi_j(u)\,du\right\}^2$.

The null distribution can be approximated using the results of Solomon
and Stephens (1977) or, most simply, Gregory (1980, p. 121).
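When tabulated approximations are not at hand, the limiting null law can also be simulated directly. The sketch below assumes the cosine basis, truncates the weight sequence to finitely many terms, and estimates an upper critical point of $\sum_j s_j Y_j^2$ by Monte Carlo; the function names, the truncation at 30 terms, and the number of replications are choices made for this example.

```python
import math
import random


def n_L_statistic(u, s):
    """n * sum_j s_j d_nj^2 with xi_j(u) = sqrt(2) cos(j pi u), where
    u_i = F_theta0(X_i) and s = [s_1, ..., s_J] is a truncated weight list."""
    n = len(u)
    total = 0.0
    for j, s_j in enumerate(s, start=1):
        d = sum(math.sqrt(2.0) * math.cos(j * math.pi * ui) for ui in u) / n
        total += s_j * d * d
    return n * total


def simulate_null(s, reps, rng):
    """Sorted Monte Carlo draws from sum_j s_j Y_j^2, Y_j iid N(0, 1)."""
    return sorted(sum(s_j * rng.gauss(0.0, 1.0) ** 2 for s_j in s)
                  for _ in range(reps))


def critical_value(sorted_draws, alpha):
    """Empirical upper-alpha point of the simulated null distribution."""
    m = len(sorted_draws)
    k = min(m, int(math.ceil((1.0 - alpha) * m)))
    return sorted_draws[k - 1]


# Truncated Cramer-von Mises weights s_j = 1 / (j pi)^2, j = 1..30.
weights = [1.0 / (j * math.pi) ** 2 for j in range(1, 31)]
rng = random.Random(3)
draws = simulate_null(weights, 10000, rng)
cv = critical_value(draws, 0.05)

u = [rng.random() for _ in range(100)]      # data generated under H_0
stat = n_L_statistic(u, weights)
print(cv, stat, stat > cv)
```

With these weights the statistic is a truncated version of the Cramér-von Mises statistic, so the simulated 5% point should lie close to the familiar asymptotic value of about 0.46.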

Results on the asymptotic power of tests based upon statistics
of the form of $nL(\theta_0,s;G_n)$ can also be easily obtained using the
results of Gregory (1980). Let two discrepancies based on different
sequences of positive weights be denoted by

(25)  $nL(\theta_0,s;G_n) = n\sum_{j=1}^\infty s_j d_{nj}^2(\theta_0)$

and

(26)  $nL(\theta_0,s^*;G_n) = n\sum_{j=1}^\infty s_j^* d_{nj}^2(\theta_0)$.

Also let $j(i)$ denote the index of the ith largest $s_j$, i.e.
$s_{j(1)} \ge s_{j(2)} \ge \cdots > 0$, and similarly define $j^*(i)$. Let
$n_i$ ($n_i^*$) be the multiplicity of the ith largest distinct value in
$\{s_j\}$ ($\{s_j^*\}$). If we denote the limiting power of a size
$\alpha$ test against the sequence in (23) using $nL(\theta_0,s;G_n)$
($nL(\theta_0,s^*;G_n)$) by $p(\alpha)$ ($p^*(\alpha)$), then by
Theorem 2.5 of Gregory (1980), with $a_j = \int_0^1 h(u)\,\xi_j(u)\,du$,
$A = \sum_{k=1}^{n_1} a_{j(k)}^2$, and $A^* = \sum_{k=1}^{n_1^*} a_{j^*(k)}^2$,

$\lim_{\alpha\to 0} p(\alpha)/p^*(\alpha) = 0$, if $A^* > A$, or $A^* = A$ and $n_1^* < n_1$;

$\lim_{\alpha\to 0} p(\alpha)/p^*(\alpha) = \infty$, if $A > A^*$, or $A = A^*$ and $n_1 < n_1^*$;

$\lim_{\alpha\to 0} p(\alpha)/p^*(\alpha) = \frac{\exp\left\{\frac{1}{2}\sum_{k>n_1} a_{j(k)}^2\, s_{j(k)}/(s_{j(1)} - s_{j(k)})\right\}}{\exp\left\{\frac{1}{2}\sum_{k>n_1^*} a_{j^*(k)}^2\, s^*_{j^*(k)}/(s^*_{j^*(1)} - s^*_{j^*(k)})\right\}}$, if $A = A^*$ and $n_1 = n_1^*$.


Several observations can be made from this result concerning the relative
efficiency of these tests.

i) If $n_1 = n_1^* = 1$, then the condition $A^* > A$ is just
$a_{j^*(1)}^2 > a_{j(1)}^2$, i.e.
$\left\{\int_0^1 h(u)\,\xi_{j^*(1)}(u)\,du\right\}^2 > \left\{\int_0^1 h(u)\,\xi_{j(1)}(u)\,du\right\}^2$.
Thus, if distinct weights $s_j$ are to be used, the largest weight
should be given to the $\xi_j$ with the largest coefficient in the
expansion

$h(u) = \sum_{j=1}^\infty a_j \xi_j(u)$, $0 < u < 1$,

for the direction of the alternatives in (23).

ii) We see furthermore from this that the ideal basis
$\{\xi_j(u), j = 1, \ldots\}$ would have $h(u)$ as a member. Hence, no
test of the form $n\sum_{j=1}^\infty s_j d_{nj}^2(\theta_0)$ will be as
efficient as a test of the form $nK(\theta_0, r^{opt}; G_n)$ unless
$s_j = 0$ for $j \ge 2$ and $\xi_1(u) = h(u)$ almost everywhere with
respect to Lebesgue measure.

iii) Thus, in general tests of the type $nL(\theta_0,s;G_n)$ are a poor
choice if there is much knowledge concerning the likely alternatives
to be encountered, although they do possess (if $s_j > 0$
for all $j$) the desirable omnibus property of being consistent
against alternatives of the form of (23), and are hence appropriate
for poorly specified alternatives.

Seldom, however, are null goodness-of-fit hypotheses simple. Most
involve the estimation of one or more parameters. Schoenfeld (1979)
discusses the behavior of tests of the form of $nK(\hat\theta_n,r;G_n)$
when the estimation is asymptotically first-order-efficient. He shows
that if the constants $\{r_j, j = 1, 2, \ldots\}$ are chosen to be
orthogonal to a specified subspace of dimension equal to that of
$\theta$, the null distribution of $nK(\hat\theta_n,r;G_n)$ is, under some
regularity conditions, the same as that of $nK(\theta_0,r;G_n)$.

Similarly, let $\theta$ be, for the sake of simplicity, one-dimensional,
and let $\hat\theta_n$ be sufficiently regular that, when sampling from
$F_{\theta_0}$,

$\sqrt{n}\,(\hat\theta_n - \theta_0) = n^{-1/2}\sum_{i=1}^n \phi(X_i;\theta_0) + o_p(1)$.

Here, $\phi(\cdot;\theta_0)$ is usually termed the influence curve (see
Hampel (1974)) of $\hat\theta_n$. If $\phi(\cdot;\theta_0)$ is such that

$\int_0^1 \phi^2(F_{\theta_0}^{-1}(u);\theta_0)\,du < \infty$,

it then possesses a Fourier expansion of the form

(27)  $\phi(F_{\theta_0}^{-1}(u);\theta_0) = \sum_{j=1}^\infty a_j \xi_j(u)$.

If the $r_j$ are such that $a_j r_j \equiv 0$, then
$\sqrt{n}\,(\hat\theta_n - \theta_0)$ and $nL(\theta_0,r;G_n)$ are
asymptotically independent and
$n\{L(\hat\theta_n,r;G_n) - L(\theta_0,r;G_n)\} \to_p 0$,
permitting use of the critical points of $nL(\theta_0,r;G_n)$ in
conducting the composite hypothesis tests. This also makes it possible to
estimate the unknown parameter and conduct an asymptotically independent
and parameter-free test of the fit of the proposed model. An estimator
with an odd influence curve would thus, under the appropriate smoothness
conditions, be asymptotically independent of a test which put nonzero
weight only on the even components (looking for tailweight or kurtosis
departures from a conjectured symmetric model).

5.  Summary 

Minimum distance methods provide a large class of estimation
procedures possessing interesting analogies to other estimation methods.
A frequency domain analysis can suggest which aspects of the efficient
score should and should not be copied to balance the competing criteria
of efficiency and robustness. MD estimation leads to a natural class of
goodness-of-fit tests, and provides (once again via frequency domain
insight) a general method for asymptotically parameter-free
goodness-of-fit test construction in the composite goodness-of-fit
problem.
Appendix

We present a brief sketch of the proof of Theorem 4. Define

$h(t) = \frac{\dot L(t,b;G) - \dot L(\theta_0,b;G)}{t - \theta_0} = \frac{\dot L(t,b;G)}{t - \theta_0}$ for $t \ne \theta_0$,

$h(t) = \ddot L(\theta_0,b;G)$ for $t = \theta_0$.

Hence, we have

$T_L[G_n] - \theta_0 = \frac{\dot L(T_L[G_n],b;G)}{h(T_L[G_n])}$.

Further define the function

$H(G_n) = \frac{\ddot L(\theta_0,b;G)}{h(T_L(G_n))}$

and the differential

$D(G_n - G) = -\frac{\sum_{j=1}^\infty b_j e_j(\theta_0) \int \xi_j(F_{\theta_0}(x))\,d(G_n - G)(x)}{\sum_{j=1}^\infty b_j e_j^2(\theta_0)}$,

which is the "average" of $IC_{T_L,G}(x)$ over the data points, our
proposed linear approximation for $T_L[G_n] - \theta_0$.



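Using the above, we write, with $T = T_L[G_n]$,

$T_L(G_n) - \theta_0 - H(G_n)D(G_n - G) = -\frac{2}{h(T)} \sum_{j=1}^\infty b_j \Big\{ \int \xi_j(F_T(x))\,dG_n(x) \int \xi_j'(F_T(x))\dot F_T(x)\,dG_n(x) - \int \xi_j(F_T(x))\,dG(x) \int \xi_j'(F_T(x))\dot F_T(x)\,dG(x) - e_j(\theta_0) \int \xi_j(F_{\theta_0}(x))\,d(G_n - G)(x) \Big\}$.

This can be shown by lengthy algebra (utilizing repeatedly the fact that
$\dot L(T_L[G_n],b;G_n) = \dot L(\theta_0,b;G) = 0$) to be equal to

$\frac{2}{h(T)} \sum_{j=1}^\infty b_j \Big[ \int \{\xi_j(F_{\theta_0}(x)) - \xi_j(F_T(x))\}\,d(G_n - G)(x) \int \xi_j'(F_T(x))\dot F_T(x)\,dG(x) + \int \xi_j(F_{\theta_0}(x))\,d(G_n - G)(x) \int \{\xi_j'(F_{\theta_0}(x))\dot F_{\theta_0}(x) - \xi_j'(F_T(x))\dot F_T(x)\}\,dG(x) \Big]$.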

Consider the two terms inside the outer brackets separately. The
contribution of the second can be bounded in absolute value by

$\frac{2}{|h(T_L(G_n))|} \sum_{j=1}^\infty |b_j|\, \sup_x |G_n(x) - G(x)|\; \|\xi_j(F_{\theta_0}(\cdot))\|_V \int \left|\xi_j'(F_{\theta_0}(x))\dot F_{\theta_0}(x) - \xi_j'(F_{T_L[G_n]}(x))\dot F_{T_L[G_n]}(x)\right| dG(x)$,

which is $o_p(n^{-1/2})$ by condition ii) and known properties of the
Kolmogorov-Smirnov statistic. The contribution of the first term is
disposed of similarly (using condition iii)), giving

$T_L(G_n) - \theta_0 - H(G_n)D(G_n - G) = o_p(n^{-1/2})$

since $H(G_n) \to_p 1$. Hence, by the Lindeberg-Lévy central limit theorem
for iid summands,

$\sqrt{n}\,(T_L(G_n) - \theta_0) \to_d N(0, \sigma_L^2(\theta_0))$.

Table 1

Relative Efficiencies of Location Estimators
Based on Truncated Fourier Approximates

Order of Approximation    Normal    Laplace